25 research outputs found

    Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

    We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants

    A global view of the nonprotein-coding transcriptome in Plasmodium falciparum

    Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense–antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors

    Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

    CAFTAN: a tool for fast mapping, and quality assessment of cDNAs

    Background: The German cDNA Consortium has been cloning full-length cDNAs and has continued to exploit them in protein localization experiments and cellular assays. However, the efficient use of large cDNA resources requires strategies capable of speedily selecting truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics. Results: CAFTAN is built around the mapping of cDNAs to the genome assembly and the subsequent analysis of their genomic context. It uses sequence features such as the presence and type of polyA signals, inner and flanking repeats, the GC content, splice site types, etc. All these features are evaluated in individual tests, which classify cDNAs according to their sequence quality and their likelihood of having been generated from fully processed mRNAs. Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping exons and the structural classification of cDNAs with respect to the reference set of splice variants. The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known protein-coding" VEGA cDNAs as high-quality multi- or single-exon cDNAs. It identified as good 80.6% of the single-exon cDNAs and 85% of the multiple-exon cDNAs. The program is written in Perl in a modular way, allowing this strategy to be adopted for other tasks such as EST annotation, or to be extended by adding new classification rules and new organism databases as they become available. We think that it is a very useful program for the annotation and research of unfinished genomes. Conclusion: CAFTAN is a high-throughput sequence analysis tool that performs a fast and reliable quality prediction of cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator or scientist a first quick overview of the quality and the already existing annotation of a set of cDNAs. It supports the rejection of low-quality cDNAs and helps in the selection of likely novel splice variants and/or completely novel transcripts for new experiments.

    Dependence of Intracellular and Exosomal microRNAs on Viral E6/E7 Oncogene Expression in HPV-positive Tumor Cells

    Specific types of human papillomaviruses (HPVs) cause cervical cancer. Cervical cancers exhibit aberrant cellular microRNA (miRNA) expression patterns. By genome-wide analyses, we investigate whether the intracellular and exosomal miRNA compositions of HPV-positive cancer cells are dependent on endogenous E6/E7 oncogene expression. Deep sequencing studies combined with qRT-PCR analyses show that E6/E7 silencing significantly affects ten of the 52 most abundant intracellular miRNAs in HPV18-positive HeLa cells, downregulating miR-17-5p, miR-186-5p, miR-378a-3p, miR-378f, miR-629-5p and miR-7-5p, and upregulating miR-143-3p, miR-23a-3p, miR-23b-3p and miR-27b-3p. The effects of E6/E7 silencing on miRNA levels are mainly not dependent on p53 and similarly observed in HPV16-positive SiHa cells. The E6/E7-regulated miRNAs are enriched for species involved in the control of cell proliferation, senescence and apoptosis, suggesting that they contribute to the growth of HPV-positive cancer cells. Consistently, we show that sustained E6/E7 expression is required to maintain the intracellular levels of members of the miR-17~92 cluster, which reduce expression of the anti-proliferative p21 gene in HPV-positive cancer cells. In exosomes secreted by HeLa cells, a distinct seven-miRNA-signature was identified among the most abundant miRNAs, with significant downregulation of let-7d-5p, miR-20a-5p, miR-378a-3p, miR-423-3p, miR-7-5p, miR-92a-3p and upregulation of miR-21-5p, upon E6/E7 silencing. Several of the E6/E7-dependent exosomal miRNAs have also been linked to the control of cell proliferation and apoptosis. This study represents the first global analysis of intracellular and exosomal miRNAs and shows that viral oncogene expression affects the abundance of multiple miRNAs likely contributing to the E6/E7-dependent growth of HPV-positive cancer cells.

    Inhibition of endogenous HPV16 E6/E7 expression: Effects on selected intracellular miRNAs.

    (A) Immunoblot analysis of HPV16 E7, HPV16 E6, p53 and p21 protein levels, 72 h after transfection of SiHa cells with si16E6/E7 or control siRNA (siContr-1), or upon mock treatment. α-Tubulin: loading control. (B) qRT-PCR analyses of ten selected cellular miRNAs, 72 h after transfection of SiHa cells with si16E6/E7 or siContr-1. Cellular miRNA levels were normalized to the snRNA RNU6–2 and calculated relative to siContr-1 (log2 display). Dashed lines: 1.5-fold up- or downregulation (log2(1.5) = 0.585). Data represent mean ± SEM (n = 3). Asterisks indicate statistically significant differences (p ≤ 0.05 (*) and p ≤ 0.01 (**)).
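
The log2 display and the 1.5-fold dashed lines in this caption can be illustrated with a small sketch. It assumes expression levels that are already normalized to a reference RNA. The function names and the example values are hypothetical and are not taken from the study, and the sketch covers only the fold-change threshold, not the statistical testing.

```python
import math

# Threshold marked by the dashed lines in the caption: a 1.5-fold change.
FOLD_THRESHOLD = 1.5
LOG2_THRESHOLD = math.log2(FOLD_THRESHOLD)  # approximately 0.585


def log2_fold_change(treated: float, control: float) -> float:
    """Return log2(treated / control) for two normalized expression levels."""
    if treated <= 0 or control <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(treated / control)


def classify(treated: float, control: float) -> str:
    """Label a change 'up', 'down' or 'unchanged' relative to the 1.5-fold lines."""
    lfc = log2_fold_change(treated, control)
    if lfc >= LOG2_THRESHOLD:
        return "up"
    if lfc <= -LOG2_THRESHOLD:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical normalized levels (treated, control), for illustration only.
    for treated, control in [(2.0, 1.0), (0.5, 1.0), (1.2, 1.0)]:
        print(treated, control, classify(treated, control))
```

On a log2 scale, up- and downregulation by the same factor are symmetric around zero, which is why a single value (0.585) defines both dashed lines.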

    Effects of the p53 status on the E6/E7-dependent modulation of intracellular miRNAs.

    (A) qRT-PCR analysis of HPV18 E6/E7 (left panel) and p21 (right panel) mRNA expression, 72 h after transfection of parental or “p53-null” HeLa cells with si18E6/E7, control siRNA (siContr-1), or upon mock treatment. mRNA levels were normalized to ACTB and calculated relative to the mock control (mock). Data represent mean ± SEM (n = 3). Asterisks above columns indicate statistically significant differences from siContr-1-treated cells (p ≤ 0.05 (*), p ≤ 0.001 (***)). (B) Immunoblot analysis of HPV18 E6, p53 and p21 protein levels, 72 h after transfection of parental or “p53-null” HeLa cells with si18E6/E7 or siContr-1, or upon mock treatment. α-Tubulin: loading control. (C) qRT-PCR analyses of selected cellular miRNAs, 72 h after transfection of parental or “p53-null” HeLa cells with si18E6/E7 or siContr-1. miR-34a-3p, positive control miRNA (p53-inducible). Cellular miRNA levels were normalized to snRNA RNU6–2 and calculated relative to siContr-1 (log2 display). Dashed lines: 1.5-fold up- or downregulation (log2(1.5) = 0.585). Data represent mean ± SEM (n = 3). Asterisks indicate statistically significant differences (p ≤ 0.05 (*), p ≤ 0.01 (**) and p ≤ 0.001 (***)).